Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique for deriving (nearly) unbiased estimators, but it is known to suffer from excessively high variance in long-horizon problems. In the extreme case of infinite-horizon problems, the variance of an IS-based estimator may even be unbounded. In this paper, we propose a new off-policy estimation method that applies IS directly to the stationary state-visitation distributions, avoiding the exploding-variance issue faced by existing estimators. Our key contribution is a novel approach to estimating the density ratio of two stationary distributions, with trajectories sampled only from the behavior distribution. We develop a mini-max loss function for the estimation problem and derive a closed-form solution for the case of a reproducing kernel Hilbert space (RKHS). We support our method with both theoretical and empirical analyses.
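To make the idea of applying IS to stationary state-visitation distributions concrete, the following is a minimal sketch on a hypothetical three-state, two-action toy problem. The transition tensor `T`, rewards `r`, and both policies are invented for illustration. The density ratio `w` is computed exactly from the known transition model, which the off-policy setting does not assume; the mini-max estimator of the ratio described in the abstract is not implemented here. The sketch only shows how a state density ratio, once available, reweights samples from a single behavior trajectory using one weight per step rather than a product of ratios over the whole horizon.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# Hypothetical toy MDP (assumption, not from the paper):
# T[s, a, s'] are transition probabilities, r[s, a] are rewards in [0, 1].
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Behavior policy pi0 and target policy pi, as pi[s, a] = P(a | s).
pi0 = np.full((n_states, n_actions), 0.5)
pi = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])


def stationary(P):
    """Stationary distribution d of a row-stochastic matrix P (d = d P, sum d = 1)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d


# State-to-state transition matrices induced by each policy.
P_pi0 = np.einsum("sa,sat->st", pi0, T)
P_pi = np.einsum("sa,sat->st", pi, T)

d_pi0 = stationary(P_pi0)
d_pi = stationary(P_pi)

# Exact stationary density ratio; in practice this must be estimated from data.
w = d_pi / d_pi0

# Ground truth: average reward of the target policy under its stationary distribution.
true_value = float(np.sum(d_pi[:, None] * pi * r))

# Self-normalized estimate from one long trajectory of the behavior policy.
n_steps = 20000
s = 0
num = 0.0
den = 0.0
for _ in range(n_steps):
    a = rng.choice(n_actions, p=pi0[s])
    weight = w[s] * pi[s, a] / pi0[s, a]  # one state ratio times one action ratio
    num += weight * r[s, a]
    den += weight
    s = rng.choice(n_states, p=T[s, a])
estimate = num / den

print(f"true value: {true_value:.4f}, estimate: {estimate:.4f}")
```

Each sample's weight here is bounded by the largest state ratio times the largest single-step action ratio, regardless of trajectory length. A trajectory-wise IS weight, by contrast, multiplies one action ratio per time step, which is the source of the variance growth with horizon noted above.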